18. Implementation
Implementation: Policy Iteration
In the previous concept, you learned about policy iteration, which proceeds as a series of alternating policy evaluation and improvement steps. Policy iteration is guaranteed to find the optimal policy for any finite Markov decision process (MDP) in a finite number of iterations. The pseudocode can be found below.
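Before opening the notebook, it may help to see the algorithm sketched in code. Below is a minimal, illustrative sketch in Python; it is not the notebook's solution, and it assumes a tabular environment that exposes env.nS, env.nA, and a transition model env.P[s][a] given as (prob, next_state, reward, done) tuples, in the style of OpenAI Gym's FrozenLake environment.

import numpy as np

def policy_evaluation(env, policy, gamma=1.0, theta=1e-8):
    """Iteratively estimate the state-value function V for a given policy."""
    V = np.zeros(env.nS)
    while True:
        delta = 0
        for s in range(env.nS):
            v = 0
            # Expected return from state s when following the (stochastic) policy.
            for a, action_prob in enumerate(policy[s]):
                for prob, next_state, reward, done in env.P[s][a]:
                    v += action_prob * prob * (reward + gamma * V[next_state])
            delta = max(delta, abs(V[s] - v))
            V[s] = v
        if delta < theta:  # stop once the value estimates have converged
            break
    return V

def policy_improvement(env, V, gamma=1.0):
    """Return a deterministic policy that is greedy with respect to V."""
    policy = np.zeros([env.nS, env.nA])
    for s in range(env.nS):
        q = np.zeros(env.nA)
        for a in range(env.nA):
            for prob, next_state, reward, done in env.P[s][a]:
                q[a] += prob * (reward + gamma * V[next_state])
        # Put all probability on a best action.
        policy[s][np.argmax(q)] = 1
    return policy

def policy_iteration(env, gamma=1.0, theta=1e-8):
    """Alternate policy evaluation and improvement until the policy is stable."""
    policy = np.ones([env.nS, env.nA]) / env.nA  # start with a uniform random policy
    while True:
        V = policy_evaluation(env, policy, gamma, theta)
        new_policy = policy_improvement(env, V, gamma)
        if np.array_equal(new_policy, policy):  # policy unchanged, so it is optimal
            return new_policy, V
        policy = new_policy

Because each improvement step yields a policy that is at least as good as the previous one, and a finite MDP has only finitely many deterministic policies, this loop terminates with an optimal policy.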

Please use the next concept to complete Part 4: Policy Iteration of Dynamic_Programming.ipynb. Remember to save your work!
If you'd like to reference the pseudocode while working on the notebook, you are encouraged to open this sheet in a new window.
Feel free to check your solution by looking at the corresponding section in Dynamic_Programming_Solution.ipynb.